Abstract

Recent advances in compiler technology have demonstrated 
the benefits of using strongly typed intermediate languages 
to compile richly typed source languages (e.g., ML). A type- 
preserving compiler can use types to guide advanced opti- 
mizations and to help generate provably secure mobile code. 
Types, unfortunately, are very hard to represent and manip- 
ulate efficiently; a naive implementation can easily add expo- 
nential overhead to the compilation and execution of a pro- 
gram. This paper describes our experience with implement- 
ing the FLINT typed intermediate language in the SML/NJ 
production compiler. We observe that a type-preserving 
compiler will not scale to handle large types unless all of 
its type-preserving stages preserve the asymptotic time and 
space usage in representing and manipulating types. We 
present a series of novel techniques for achieving this prop- 
erty and give empirical evidence of their effectiveness. 

1 Introduction 

Compilers for richly typed languages (e.g., ML [21]) have 
long used variants of the untyped λ-calculus [2, 10] as their
intermediate languages. An untyped compiler first type- 
checks the source program, and then translates the program 
to the intermediate language, discarding all the type infor- 
mation. Types are used to ensure that the program will not 
“go wrong” at run time, but they do not affect the rest of 
compilation and execution in any way. 

Recent advances in compiler technology have demon- 
strated many distinct advantages of using strongly typed 
intermediate languages to compile richly typed source
languages. A type-preserving compiler type-checks the source
program, but then translates both the program and the 
(inferred) type information into the intermediate language. 
The rest of the compiler can use types to guide advanced op- 
timizations [28, 16, 22, 34, 6] and to help generate provably 
secure mobile code [13, 26, 19, 23, 17]. The compiler can 
also propagate the type information into the target code to 
support sophisticated run-time type dispatches and garbage 
collection [14, 22, 34, 43]. 

Unfortunately, type information is very hard to repre- 
sent and manipulate efficiently, especially when the under- 
lying type system involves ML-like polymorphic types and 
module types [21]. A naive implementation can easily add 
exponential overhead to the compilation and execution of a 
program. For example, in the following ML program: 

    fun f x = x
    fun toy () =
      let fun g y = (...(((f f) f) f) ... f) y
      in g 3
      end

the identity function f has polymorphic type $\forall\alpha.\alpha \to \alpha$.
Suppose we apply f to itself n times as shown. According
to the ML type inference algorithm [5], the rightmost f has
type $\tau_1 = \alpha \to \alpha$, while the leftmost f gets instantiated
to $\tau_n = \tau_{n-1} \to \tau_{n-1}$. Clearly, representing $\tau_n$ as a
tree-like structure would require $O(2^n)$ space, so even a
small n (e.g., 30) would wreck the efficiency of the compiler.
To avoid such exponential blowup, we must represent and
manipulate $\tau_n$ as a linear-sized dag, in which the node for
each $\tau_k$ points twice to the single shared node for $\tau_{k-1}$.



In fact, we must ensure that all type-related operations in
the compiler (including those at run time if we pass types
there) handle such large types in the same way. For
instance, in the let body above, when g is specialized to
$\mathrm{int} \to \mathrm{int}$, we need to apply a substitution (from $\alpha$ to int)
to all instances of $\tau_i$; clearly, we must traverse the dag in
linear time and preserve its shape.
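
To make the size gap concrete, here is a minimal Standard ML sketch; it is an illustration only, not the FLINT representation. It assumes a dag stored as a vector whose arrow nodes refer to earlier nodes by index; the names `node`, `tauDag`, `treeSize`, and `substInt` are hypothetical.

```sml
(* A dag is a vector of nodes; an arrow's children are indices of
   earlier nodes, so shared subterms are stored only once. *)
datatype node = TVar | TInt | Arrow of int * int
type dag = node vector

(* tauDag n: node 0 is the type variable alpha and node k (k >= 1) is
   tau_k = tau_(k-1) -> tau_(k-1); the root is the last node. *)
fun tauDag n : dag =
  Vector.tabulate (n + 1, fn 0 => TVar | k => Arrow (k - 1, k - 1))

(* Number of nodes in the tree obtained by unfolding the root.
   One pass over the dag; plain ints may overflow for large n. *)
fun treeSize (d : dag) : int =
  let
    val sizes = Array.array (Vector.length d, 0)
    fun size (Arrow (a, b)) = 1 + Array.sub (sizes, a) + Array.sub (sizes, b)
      | size _ = 1
    fun loop k =
      if k < Vector.length d
      then (Array.update (sizes, k, size (Vector.sub (d, k))); loop (k + 1))
      else ()
  in
    loop 0;
    Array.sub (sizes, Vector.length d - 1)
  end

(* Substituting int for alpha touches each dag node once and keeps
   every child index, hence the sharing, intact. *)
fun substInt (d : dag) : dag = Vector.map (fn TVar => TInt | nd => nd) d

val dagNodes  = Vector.length (tauDag 10)   (* n + 1 nodes *)
val treeNodes = treeSize (tauDag 10)        (* 2^(n+1) - 1 nodes *)
```

For n = 10 the dag has 11 nodes while the unfolded tree has 2047; the substitution visits the 11 dag nodes, not the 2047 tree nodes.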

Although the preceding example is a bit contrived,¹ large
types are ubiquitous in real-world ML applications. For example,
a 200-line ML program cm.sml in the SML/NJ compilation
manager (CM) [4] contains more than 36 functor
applications and more than 80 structure references; each of
these modules may contain a dag of sub-structures or functors.
Figure 1 gives a profile of types built while compiling
two large ML applications in our type-preserving compiler
(see later sections for details about the compiler). Here, CM
is the compilation manager [4] and eXene is an ML-based X
window system tool-kit [31]. If we use tree representations,
a single type can contain more than 45,518 nodes for CM
and 379,315 nodes for eXene. These large types can be dramatically
reduced in size if we use dag representations. For
example, under CM, 32.03% of the space occupied by types
can be improved by a factor of at least 64 when we use dag
representations. For eXene, the savings are even more dramatic.
Of course, no real compiler will use the naive tree
representations all the time, but the profile does show that
any loss of sharing in type representations could potentially
incur huge costs in compilation time and space usage.

¹ It is well known that ML type inference can take exponential time
and space on certain kinds of ML programs [20]; the toy function
defined here, however, does not belong to this category. An untyped
compiler could compile the toy function without any problem.

For each type built during the compilation, we calculate the size of its tree representation
and dag representation (in number of nodes). The ratio of these two shows the amount
of savings of the dag representation. We then use the range of this ratio ($2^{2i}$ to $2^{2i+2} - 1$,
where $i = 0, \ldots, 4$) to classify all the types; for each category, we list the number of its
members, the average size of the tree representations, and the percentage of its total size
over that of all categories.

Figure 1: A profile of compile-time type information.

This paper describes our experience with implementing
the FLINT typed intermediate language [36] in the SML/NJ
production compiler [39, 3]. FLINT is based on a predicative
variant of the polymorphic λ-calculus $F_\omega$ [12, 32, 15],
extended with a rich set of primitive types and functions.
FLINT supports both polymorphic types and higher-order
type constructors, so the type language itself is a full-scale
λ-calculus. To support various type-directed optimizations
[14, 34], we perform a large number of type-related operations
during compilation. The main challenge is to represent
complex FLINT types (which can be arbitrary lambda
terms) as compact dags so that common type-related operations
(e.g., lambda reductions, equality) can always work
efficiently and yet still preserve sharing.

More generally, we believe that a type-preserving com- 
piler will not scale to handle large types unless all of its 
type-preserving stages can preserve the asymptotic time and 
space usage in representing and manipulating types. To 
achieve this property, we present a novel and efficient rep- 
resentation scheme for the FLINT type calculus. Our main 
idea is to combine hash-consing, memoization, and advanced 
lambda encoding [24, 1, 9] to ensure that (1) types are al- 
ways represented as dags; (2) type reductions are done on a 
by-need basis; and (3) the cost of handling types is proportional
to the size of the dag representations. In a companion
paper [33], we have presented a new optimal type-lifting
algorithm that lifts all run-time type constructions to the top
level; in fact, we can guarantee that the number of types 
built at run time is a compile-time constant; furthermore, 
all of them are represented as efficiently as their compile- 
time counterparts. 

The main contributions of this paper are: 

• As far as we know, our work is the first comprehensive
study on how to build scalable implementations of
type-preserving production compilers. Several existing
compilers [41, 27, 42] have also used typed intermediate
languages, but none of them has attempted to
scale its implementation to handle large types; in
fact, all these compilers have reported extremely slow
compilation times as a result of keeping types during
compilation.²

• We combine hash-consing, memoization, and advanced
lambda encoding [24, 1, 9] to support efficient type representation
and manipulation. Although each of these
techniques has been researched and implemented before,
nobody has ever tried to combine them to represent
compiler type information. Combining these techniques
is crucial yet non-trivial, as we will demonstrate
in Section 5 and Section 7.

• We describe several different ways of representing type
variables bound in the term languages and then compare
their performance. Representing type variables as
de Bruijn indices is faster but it also makes type manipulation
harder. We show that using explicit names
to represent type variables might be a more desirable
alternative.

• All techniques discussed in this paper have been implemented
and incorporated into the SML/NJ production
compiler since version 109.24 (January 1997). The resulting
compiler has been used and tested world-wide
on a large number of ML applications for more than 14
months. We have not received any complaints about
the compilation time after we switched to the type-preserving
implementation. We are not aware of any
other type-preserving ML compilers that can handle
large applications such as CM and eXene.

• To verify the effectiveness of these techniques, we
have measured and compared several versions of the
SML/NJ compiler on a variety of benchmark programs.
The combination of these techniques can reduce
the total compilation time by up to 72% on large
applications (a reduction of 93% in the type-preserving
phases).

• We also present a detailed comparison between our
scheme and the lettype scheme used in the TIL/ML
compiler (also informally described in Tarditi’s thesis [40]).

² Although GHC makes little use of its type information in the back
end, it still runs out of memory when compiling the toy benchmark.


2 Related Work

Typed intermediate languages have received much attention
lately, especially in the HOT (higher-order and typed) language
community. However, recent work [14, 22, 36, 30, 8,
29, 23] has mostly focused on the theoretical foundations or
other language design issues. This paper complements previous
work by showing that typed intermediate languages
can indeed have practical and scalable implementations, but
only if extreme care is taken. In fact, most of the techniques
described in this paper have been incorporated into
the SML/NJ production compiler since version 109.24 (January
1997). Many results reported here are inspired by feedback
from the SML/NJ user community.

Several existing compilers such as TIL [41], GHC [27],
and ML-Kit [42] have also used an $F_\omega$-like calculus as their
typed intermediate language. However, none of them has
seriously addressed the problem of how to handle large types,
nor do they support efficient run-time type passing.

The suspension-based lambda encoding used in our implementation
is directly borrowed from Nadathur’s recent
work on efficient lambda representations [25, 24]. In addition
to doing an in-depth theoretical study of the underlying
encoding calculus, Nadathur [24] has also used his encoding
to implement the λProlog system. The main contribution
of our work is to combine Nadathur’s encoding with hash-consing
and memoization, and then apply it to the context of
typed intermediate languages. Combining these techniques
is non-trivial because of the presence of higher-order types
and the need to memoize intermediate reduction results.

Explicit substitution [9, 1] is another related lambda
encoding scheme. Cardelli’s Quest compiler [1] contains an
implementation of this encoding; however, he did not combine
it with the other techniques we used, nor was he working
in the context of type-preserving compilers.

Shao and Appel [39] used hash-consing to enforce dag
representations for types; however, their intermediate language
is only monomorphically typed, so it is much easier
to support than FLINT-like languages. Tarditi [40] used the
lettype constructs (in both the constructor calculus and
the term language) to A-normalize [10] all types in order to
express sharing explicitly. But he relies on a separate common
sub-expression elimination phase to identify the sharing
information. This amounts to hash-consing with the disadvantage
that it comes too late (huge redundant types have
already been built) and it does not guarantee that further
redundancies will not be introduced later in the compilation
process. So it is not clear that the lettype scheme will
address issues such as compilation time in the face of real-world
applications. Finally, lettype has poorly understood
theoretical foundations; we do not know of any notion of normal
form³ in the context of lettype, for example. A more
detailed comparison between our scheme and the lettype
scheme is given in Section 8.


(kind)   $\kappa ::= \Omega \mid \kappa_1 \to \kappa_2$
(tycon)  $\mu ::= t \mid \mathrm{Int} \mid \mu_1 \to \mu_2 \mid \lambda t :: \kappa.\mu \mid \mu_1[\mu_2]$
(type)   $\sigma ::= T(\mu) \mid \sigma_1 \to \sigma_2 \mid \forall t :: \kappa.\sigma$
(term)   $e ::= i \mid x \mid \lambda x : \sigma.e \mid @x_1 x_2 \mid \Lambda t :: \kappa.e \mid x[\mu] \mid \mathrm{let}\ x = e_1\ \mathrm{in}\ e_2$

Figure 2: Syntax of the Core-FLINT calculus.

3 An Overview of FLINT 

The core language of FLINT is based on a predicative variant
of the Girard-Reynolds polymorphic λ-calculus $F_\omega$ [12,
32], with the term language written in A-normal form [10].
It contains the following four syntactic classes: kinds ($\kappa$),
type constructors ($\mu$), types ($\sigma$), and terms ($e$), as shown in
Figure 2. Here, kinds classify type constructors, and types
classify terms. Constructors of kind $\Omega$ name monotypes.
The monotypes are generated from variables, from Int, and
through the $\to$ constructor. As in $F_\omega$, the application and
abstraction constructors (i.e., $\mu_1[\mu_2]$ and $\lambda t :: \kappa.\mu$) correspond
to the function kind $\kappa_1 \to \kappa_2$. Types in Core-FLINT
include the monotypes, and are closed under function spaces
and polymorphic quantification. We use $T(\mu)$ to denote the
type corresponding to the constructor $\mu$ when $\mu$ is of kind $\Omega$.
As in $F_\omega$, the term language is an explicitly typed polymorphic
λ-calculus (but written in A-normal form); both type
abstraction and type application are explicit.

The actual FLINT language contains other familiar constructs
such as records, recursive datatypes, and a rich set
of primitive types and operators. Large types mainly come
from ML-style modules (which are represented as FLINT
records [37]) and recursive datatypes, but the challenge of
implementing FLINT still lies in how we handle three forms
of type abstraction, i.e., constructor functions ($\lambda t :: \kappa.\mu$),
polymorphic types ($\forall t :: \kappa.\sigma$), and polymorphic functions
($\Lambda t :: \kappa.e$). We present our solutions in Sections 5 and 6.

The structure of our type-preserving compiler is very 
similar to that of conventional untyped compilers. Programs 
written in the source languages (e.g., ML) are first fed into a 
language-specific front end which does parsing, elaboration, 
type-checking, and pattern-match compilation; the source 
program is then translated into the FLINT typed intermediate
format. The middle end does conventional dataflow optimizations,
type specializations, and λ-calculus-based contractions
and reductions, producing an optimized version of
the FLINT code. The back end compiles FLINT into ma- 
chine code through the usual phases such as representation 
analysis [34], safe-for-space closure conversion [38], register 
allocation, instruction scheduling, and machine-code gener- 
ation [11]. 

³ Of course, we could always expand out the lettype definitions to
get to a normal form, but this eliminates the benefit of using lettype
and is equivalent to using tree representations.

signature LTYEXTERN = sig
  (* abstract types *)
  type tkind                            (* κ *)
  type tyc                              (* μ *)
  type lty                              (* σ *)

  (* constructors *)
  val tcc_int   : tyc                   (* Int *)
  val tcc_var   : tvar -> tyc           (* t *)
  val tcc_arrow : tyc * tyc -> tyc      (* μ → μ *)
  val tcc_fn    : tkind * tyc -> tyc    (* λκ.μ *)
  val tcc_app   : tyc * tyc -> tyc      (* μ[μ] *)

  (* selectors *)
  val tcd_var   : tyc -> tvar
  val tcd_arrow : tyc -> tyc * tyc
  val tcd_fn    : tyc -> tkind * tyc
  val tcd_app   : tyc -> tyc * tyc

  (* predicates *)
  val tcp_int   : tyc -> bool
  val tcp_var   : tyc -> bool
  val tcp_arrow : tyc -> bool
  val tcp_fn    : tyc -> bool
  val tcp_app   : tyc -> bool

  (* utility functions *)
  val tc_eqv    : tyc * tyc -> bool
  val tc_print  : tyc -> string
end (* LTYEXTERN *)


Figure 3: Interface to the FLINT constructor language ($\mu$).
Some constructor forms are omitted. Similar interfaces exist
for FLINT types ($\sigma$) and kinds ($\kappa$).


4 Implementation Criteria 

In this section, we list the goals that guided the implemen- 
tation of the FLINT type language, and we describe its 
interface. We present the implementation details in Sec- 
tion 5. The following criteria are, in our experience, impor- 
tant for an efficient implementation of a typed intermediate 
language. 

Compact space usage. As demonstrated in Section 1, 
large types are ubiquitous in real-world ML applications. 
For this reason, it is imperative that we represent these types 
efficiently. Fortunately, large types are also highly redun- 
dant, so a well-constructed dag representation can be quite 
compact. The representation should, however, come with 
either a guarantee that all such redundancy is exploited, or 
empirical evidence showing that, in practice, types remain 
compact even as they are manipulated and transformed. 

Linear-time traversal of types. Compact representations 
are not enough to ensure efficiency in a type-preserving com- 
piler. Many operations on types (e.g., substitution and re- 
duction) require traversing the graph. If we are not careful, 
these operations might traverse isomorphic subgraphs mul- 
tiple times, even though they share the same representation. 
In order to maintain reasonable compilation time, such op- 
erations must traverse the representation linearly. 

Fast equality. Checking the equivalence of two types can
be non-trivial, because there are many ways to represent
the same type. Equality checking is often used by type-directed
optimizations. For example, representation analysis
[34] uses equality to determine where wrapping is necessary.
Moreover, two compelling operations made possible by
type-preserving compilation perform equality tests repeatedly:
type-checking intermediate phases and certifying object
code [26, 23]. Thus, the implementation should support
efficient equality tests.

Simple Interface. The software engineering benefits of
hiding implementation details from clients are widely recognized.
Besides concealing the tricks used to meet the other
criteria, we would like clients to treat each type as its intension;
different representations of a single type should all look
the same. There are two ways to achieve such an interface.
Either all types passed across the interface are in normal
form (corresponding to eager reduction), or the top node of
a type is the same as for the normalized version (weak head
normal form), and the rest is normalized on demand.

Our implementation meets all of these goals. We guarantee
sharing by hash-consing and storing each type in a global
table. Isomorphic types always share their representation,
regardless of where they appear or how they were constructed.
All reductions and substitutions are guaranteed to traverse
the representation linearly (because these important operations
are specifically supported by the implementation).
For clients implementing other transformations, we provide
a memoizing fold function that is guaranteed to traverse the
representation linearly.
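
The idea of a memoizing fold can be sketched as follows. This is a hedged illustration, not the FLINT function: it assumes every hash-consed node carries a unique id in the range 0 to n - 1, and the names `memoFold`, `onLeaf`, and `onArrow` are hypothetical.

```sml
(* Each node carries a unique id, as a hash-consing table would assign. *)
datatype tyc = T of int * shape
and shape = TInt | TVar | Arrow of tyc * tyc

(* memoFold n (onLeaf, onArrow) t: bottom-up fold over t that caches the
   result per node id, so every distinct node is processed exactly once.
   Assumes all ids are in the range 0 .. n-1. *)
fun memoFold (n : int) (onLeaf : shape -> 'a, onArrow : 'a * 'a -> 'a) (t : tyc) : 'a =
  let
    val memo : 'a option array = Array.array (n, NONE)
    fun go (T (id, s)) =
      case Array.sub (memo, id) of
        SOME v => v
      | NONE =>
          let
            val v = case s of
                      Arrow (a, b) => onArrow (go a, go b)
                    | leaf => onLeaf leaf
          in
            Array.update (memo, id, SOME v); v
          end
  in
    go t
  end

(* tau k shares its two children, so it has k + 1 distinct nodes. *)
fun tau 0 = T (0, TVar)
  | tau k = let val t = tau (k - 1) in T (k, Arrow (t, t)) end

(* Count tree nodes while recording how many dag nodes were processed. *)
val visits = ref 0
val size = memoFold 11 (fn _ => (visits := !visits + 1; 1),
                        fn (a, b) => (visits := !visits + 1; 1 + a + b))
                    (tau 10)
```

Here the fold computes the tree size 2047 while invoking the handlers only 11 times, once per dag node.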

For equality testing, thanks to our guarantee that isomorphic
types share the same representation, types in normal
form can be compared very quickly using pointer equality.
If the types are not in normal form and pointer equality
fails, then we reduce the types to weak head normal form,
check if the heads have the same shape, and continue recursively
on the sub-terms. In practice, this leads to very cheap
equality tests. Complete type-checking of the intermediate
code after every phase does not incur noticeable overhead.

Figure 3 gives part of the FLINT type language interface.
All operations on FLINT type constructors ($\mu$) are done through a set
of basic primitives: constructor functions create types from
their components; predicates test for particular constructs;
selectors project the components, assuming an appropriate
construct is given. Additionally, the interface contains functions
for equivalence testing, pretty-printing, etc.

The interface behaves as if all types are kept in normal 
(fully reduced) form, even though the underlying implemen- 
tation uses lazy reduction. For example, suppose we create 
a type t by applying the identity function to Int. Then, 
tcp_app(t) will return false, whereas tcp_int(t) will re- 
turn true. 
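
The behaviour described above can be imitated with a small sketch. It is an assumption-laden toy rather than the FLINT implementation: kinds and hash-consing are omitted, `tcc_fn` takes only a body, substitution is done eagerly once the head is demanded, and applied arguments are assumed to be closed.

```sml
structure Tyc :> sig
  type tyc
  val tcc_int : tyc
  val tcc_var : int -> tyc             (* de Bruijn index, 1-based *)
  val tcc_fn  : tyc -> tyc             (* kind argument omitted *)
  val tcc_app : tyc * tyc -> tyc
  val tcp_int : tyc -> bool
  val tcp_app : tyc -> bool
end = struct
  datatype shape = Int | Var of int | Fn of shape ref | App of shape ref * shape ref
  type tyc = shape ref       (* mutable cell, so reductions can be cached *)

  val tcc_int = ref Int
  fun tcc_var n = ref (Var n)
  fun tcc_fn body = ref (Fn body)
  fun tcc_app (f, x) = ref (App (f, x))

  (* Substitute the closed constructor arg for index d in t. *)
  fun subst (t, d, arg) =
    case !t of
      Int => t
    | Var n => if n = d then arg else if n > d then ref (Var (n - 1)) else t
    | Fn b => ref (Fn (subst (b, d + 1, arg)))
    | App (f, x) => ref (App (subst (f, d, arg), subst (x, d, arg)))

  (* Reduce the head only when a client inspects it, then overwrite the
     cell so that the work is not repeated. *)
  fun whnf t =
    case !t of
      App (f, x) =>
        (case whnf f of
           Fn body => let val r = whnf (subst (body, 1, x)) in t := r; r end
         | _ => !t)
    | s => s

  fun tcp_int t = (case whnf t of Int => true | _ => false)
  fun tcp_app t = (case whnf t of App _ => true | _ => false)
end

(* The identity constructor applied to Int looks like Int to clients. *)
val t = Tyc.tcc_app (Tyc.tcc_fn (Tyc.tcc_var 1), Tyc.tcc_int)
val isApp = Tyc.tcp_app t
val isInt = Tyc.tcp_int t
```

Because the signature is opaque, clients can only observe `t` through the predicates, which see the reduced head: `isApp` is false and `isInt` is true.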


5 Representing Types 

Now we turn to a detailed explanation of our implementation
techniques. We show how to represent complex FLINT
types as compact dags and make the costs of all type-related
operations (e.g., substitution, equality) proportional to the
dag size. We will focus our discussion on the FLINT constructors
($\mu$) only, though these techniques apply to the
FLINT types ($\sigma$) and kinds ($\kappa$) as well. In fact, the issues
involved in implementing polymorphic types ($\forall t :: \kappa.\sigma$) are
precisely the same as those for higher-order constructors ($\lambda t :: \kappa.\mu$).

5.1 Suspension-based lambda encoding 

The first challenge in representing FLINT constructors is
to choose an appropriate encoding for efficient manipulation.
Under the syntax in Figure 2, testing the equality
of α-convertible constructors such as $\mu_1 = \lambda t_1 :: \Omega.\, t_1 \to t_1$
and $\mu_2 = \lambda t_2 :: \Omega.\, t_2 \to t_2$ is non-trivial. We use de Bruijn
indices [7] to represent type variables, so that α-equivalent
constructors always have the same representation. For example,
both $\mu_1$ and $\mu_2$ are represented as $\lambda\Omega.(\#1 \to \#1)$.
The $\lambda$ no longer binds any named type variables (though
the kind is still retained). Instead, we use a positive integer
$\#n$ to denote the variable bound by the $n$th surrounding
λ-binder.

(r1)  $(\lambda\kappa.\mu_1)[\mu_2] \Rightarrow \mathrm{Env}(\mu_1, (1, 0, (0, \mu_2) :: \mathit{nil}))$
(r2)  $\mathrm{Env}(\mu, (0, 0, \mathit{nil})) \Rightarrow \mu$
(r3)  $\mathrm{Env}(\mu, (i, j, \mathit{env})) \Rightarrow \mu$  if $\mu$ is closed (i.e., it has no free type variables)
(r4)  $\mathrm{Env}(\#n, (i, j, \mathit{env})) \Rightarrow \#(n - i + j)$  if $n > i$
(r5)  $\mathrm{Env}(\#n, (i, j, \mathit{env})) \Rightarrow \#(j - j')$  if $n \le i$ and the $n$-th element of $\mathit{env}$ is $j'$
(r6)  $\mathrm{Env}(\#n, (i, j, \mathit{env})) \Rightarrow \mathrm{Env}(\mu, (0, j - j', \mathit{nil}))$  if $n \le i$ and the $n$-th element of $\mathit{env}$ is $(j', \mu)$
(r7)  $\mathrm{Env}(\mathrm{Int}, \rho) \Rightarrow \mathrm{Int}$
(r8)  $\mathrm{Env}(\mu_1 \to \mu_2, \rho) \Rightarrow \mathrm{Env}(\mu_1, \rho) \to \mathrm{Env}(\mu_2, \rho)$
(r9)  $\mathrm{Env}(\mu_1[\mu_2], \rho) \Rightarrow (\mathrm{Env}(\mu_1, \rho))[\mathrm{Env}(\mu_2, \rho)]$
(r10) $\mathrm{Env}(\lambda\kappa.\mu, (i, j, \mathit{env})) \Rightarrow \lambda\kappa.\mathrm{Env}(\mu, (i + 1, j + 1, j :: \mathit{env}))$
(r11) $\mathrm{Env}(\mathrm{Env}(\mu, (i, j, \mathit{env})), (0, j', \mathit{nil})) \Rightarrow \mathrm{Env}(\mu, (i, j + j', \mathit{env}))$

Figure 4: Type reductions under suspension-based lambda encoding.
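
As a small illustration of the de Bruijn representation described in this subsection, the following sketch converts named constructors to nameless ones. Kinds are omitted and all datatype and function names (`named`, `db`, `toDB`) are hypothetical.

```sml
(* Named constructors (kinds omitted) and their de Bruijn counterparts. *)
datatype named = NVar of string | NInt | NArrow of named * named
               | NFn of string * named
datatype db = DVar of int | DInt | DArrow of db * db | DFn of db

exception Unbound of string

(* 1-based position of the innermost binder named x in the binder stack. *)
fun index (x : string, env) =
  let fun find (_, []) = NONE
        | find (k, y :: ys) = if x = y then SOME k else find (k + 1, ys)
  in find (1, env) end

(* env lists the enclosing binders, innermost first. *)
fun toDB env (NVar x) = (case index (x, env) of
                            SOME k => DVar k
                          | NONE => raise Unbound x)
  | toDB _ NInt = DInt
  | toDB env (NArrow (a, b)) = DArrow (toDB env a, toDB env b)
  | toDB env (NFn (x, body)) = DFn (toDB (x :: env) body)

(* Two alpha-convertible constructors ... *)
val mu1 = NFn ("t1", NArrow (NVar "t1", NVar "t1"))
val mu2 = NFn ("t2", NArrow (NVar "t2", NVar "t2"))
(* ... receive the same nameless representation. *)
val same = (toDB [] mu1 = toDB [] mu2)
```

Both `mu1` and `mu2` translate to `DFn (DArrow (DVar 1, DVar 1))`, so equality up to renaming becomes plain structural equality.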

Another important requirement is that type reduction
should be done lazily. To achieve this, we enrich the constructor
calculus to support a new suspension term [25, 24]
of the form $\mathrm{Env}(\mu, \rho)$. Intuitively, a suspension represents
an unevaluated type “$\rho(\mu)$”; it corresponds to the intermediate
result of some unevaluated type applications. The
substitutions involved ($\rho$) are also known as explicit substitutions [1, 9]:

(constructor)   $\mu ::= \#n \mid \mathrm{Int} \mid \mu_1 \to \mu_2 \mid \lambda\kappa.\mu \mid \mu_1[\mu_2] \mid \mathrm{Env}(\mu, \rho)$
(substitution)  $\rho ::= (i, j, \mathit{env})$
(environment)   $\mathit{env} ::= \mathit{nil} \mid j :: \mathit{env} \mid (j, \mu) :: \mathit{env}$

Following Nadathur [24, 25], we represent each such substitution
as a triple $(i, j, \mathit{env})$ where the first index $i$ indicates
the current embedding level of bound type variables, the
second index $j$ indicates its new embedding level, and the
environment $\mathit{env}$ contains the actual bindings of all $i$ bound
variables. Each entry in the environment is represented as
a pair $(j', \mu')$ or as an integer $j'$ (which has the same meaning
as $(j', \#0)$); in either case, $j'$ denotes the definitional depth
of type $\mu'$. Figure 5 shows the relationship among these
components for a type $\mathrm{Env}(\mu, (i, j, \mathit{env}))$, assuming all environment
entries of form $j'$ are represented as $(j', \#0)$, and
the environment $\mathit{env}$ is equal to $(j_1, \mu_1) :: \cdots :: (j_i, \mu_i) :: \mathit{nil}$.

For example, the standard β-contraction $(\lambda\kappa.\mu_1)[\mu_2]$ results
in a constructor of the form $\mathrm{Env}(\mu_1, \rho)$ where $\rho =
(1, 0, (0, \mu_2) :: \mathit{nil})$. This represents the following fact: the
constructor $\mu_1$, which was originally in the scope of 1 abstraction,
is now to be thought of as being in the scope of
none; $\mu_2$, originally in the scope of 0 abstractions, is to be
substituted for the first free variable in $\mu_1$.

Figure 5: Illustration of a suspension type

Figure 4 gives the set of type reductions used in our λ encoding.
Here, Rule (r1) turns a type application into the suspension
form. Rules (r2) and (r3) are two straightforward
optimizations, capturing the fact that applying an empty
substitution to a type or applying a substitution to a closed
type should have zero effect. Because we memoize the set of
free variables in our type representations (see Section 5.2),
it is very easy to check whether a type is closed. Rules (r4)
to (r6) show how we adjust and substitute each de Bruijn
index-based type variable: for variables that are bound outside
the current binding level ($i$), the new de Bruijn index
would be $(n - i) + j$; for variables bound in the current environment,
we find the corresponding mapping and adjust
the result from its definitional level $j'$ to the new embedding
level $j$. Rules (r7) to (r10) push the substitution recursively
into the subterms of each type; for type functions ($\lambda\kappa.\mu$), we
need to add a new entry into the current environment (see
Rule (r10)). Rule (r11) is a simple optimization to merge
two nested substitutions. Notice that because all intermediate
results are expressible in our calculus, the reduction rules
do not involve any external substitution machinery. More
details about the suspension-based calculus can be found in
Nadathur’s excellent paper [24].
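
The reduction rules can be transcribed almost directly into code. The sketch below is one possible reading of Figure 4, under stated assumptions: kinds are dropped, nothing is hash-consed or memoized, and the closedness shortcut (r3) is left out because it needs the cached free-variable set. The names `whnf` and `norm` are hypothetical.

```sml
datatype tyc
  = Var of int                              (* #n, 1-based de Bruijn index *)
  | Int
  | Arrow of tyc * tyc
  | Fn of tyc                               (* lambda kappa. mu, kind omitted *)
  | App of tyc * tyc
  | Env of tyc * (int * int * entry list)   (* suspension Env(mu, (i, j, env)) *)
and entry = Lev of int | Bind of int * tyc  (* j'  or  (j', mu) *)

(* Weak head normalization: rewrite only until the top constructor is known. *)
fun whnf (App (f, x)) =
      (case whnf f of
         Fn body => whnf (Env (body, (1, 0, [Bind (0, x)])))          (* r1 *)
       | f' => App (f', x))
  | whnf (Env (t, (0, 0, []))) = whnf t                               (* r2 *)
  | whnf (Env (t, rho as (i, j, env))) =
      (case t of
         Var n =>
           if n > i then Var (n - i + j)                              (* r4 *)
           else (case List.nth (env, n - 1) of
                   Lev j' => Var (j - j')                             (* r5 *)
                 | Bind (j', m) => whnf (Env (m, (0, j - j', []))))   (* r6 *)
       | Int => Int                                                   (* r7 *)
       | Arrow (a, b) => Arrow (Env (a, rho), Env (b, rho))           (* r8 *)
       | App (f, x) => whnf (App (Env (f, rho), Env (x, rho)))        (* r9 *)
       | Fn b => Fn (Env (b, (i + 1, j + 1, Lev j :: env)))           (* r10 *)
       | Env (m, (i', j', env')) =>
           if i = 0 andalso null env
           then whnf (Env (m, (i', j' + j, env')))                    (* r11 *)
           else whnf (Env (whnf t, rho)))
  | whnf t = t

(* Full normalization: apply whnf at the head, then recurse into the parts. *)
fun norm t =
  case whnf t of
    Arrow (a, b) => Arrow (norm a, norm b)
  | Fn b => Fn (norm b)
  | App (f, x) => App (norm f, norm x)
  | t' => t'

(* (lambda. lambda. #2 -> #1)[Int] normalizes to lambda. Int -> #1 *)
val ex1 = norm (App (Fn (Fn (Arrow (Var 2, Var 1))), Int))
(* (lambda. lambda. #2)[#1]: the free #1 becomes #2 under the binder *)
val ex2 = norm (App (Fn (Fn (Var 2)), Var 1))
```

Note that `whnf` leaves suspensions in place below the head (rules r8 and r10), which is what makes the reduction by-need; only `norm` forces them.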

5.2 Hash-consing and memoization 

After we choose the appropriate encoding scheme, we hash- 
cons all FLINT kinds, type constructors (including substi- 
tutions), and types into three separate hash tables. Under 
hash-consing, all FLINT types built during the compilation 
are guaranteed to use the most compact dag representation. 
Because we are using de Bruijn notation, type variables are 
represented as integers and all α-convertible types have identical
representations, which allows them to be collapsed via
hash-consing. 
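
A minimal sketch of hash-consing for de Bruijn constructors follows. It is illustrative only: an association list stands in for the real hash table, weak pointers are not modelled, kinds are omitted, and the names `mk`, `keyOf`, and `same` are hypothetical.

```sml
(* A hash-consed constructor: a unique id plus its shape. Children are
   themselves hash-consed, so comparing shapes only needs child ids. *)
datatype tyc = T of int * shape
and shape = Int | Var of int | Arrow of tyc * tyc | Fn of tyc

fun idOf (T (id, _)) = id

(* Shallow key: the constructor tag together with the ids of the children. *)
datatype key = KInt | KVar of int | KArrow of int * int | KFn of int
fun keyOf Int = KInt
  | keyOf (Var n) = KVar n
  | keyOf (Arrow (a, b)) = KArrow (idOf a, idOf b)
  | keyOf (Fn b) = KFn (idOf b)

(* The global table; a list is used here only to keep the sketch short. *)
val table : (key * tyc) list ref = ref []
val nextId = ref 0

(* mk s returns the unique representative of shape s, creating it if needed. *)
fun mk (s : shape) : tyc =
  let val k = keyOf s
  in
    case List.find (fn (k', _) => k' = k) (!table) of
      SOME (_, t) => t
    | NONE =>
        let val t = T (!nextId, s)
        in nextId := !nextId + 1; table := (k, t) :: !table; t end
  end

(* Equality of hash-consed constructors is a constant-time id comparison. *)
fun same (a, b) = idOf a = idOf b

(* lambda.(#1 -> #1) built twice yields one shared representative. *)
val f1 = mk (Fn (mk (Arrow (mk (Var 1), mk (Var 1)))))
val f2 = mk (Fn (mk (Arrow (mk (Var 1), mk (Var 1)))))
val shared = same (f1, f2)
```

After both constructions the table holds just three entries (the variable, the arrow, and the function), and `shared` is true.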

For each hash entry, we use weak pointers so that if an
element in the hash table is no longer used anywhere else,
it will be garbage collected. Internally, each constructor
$\mu$ is now accessed indirectly via an updateable hash cell,⁴
denoted as $\mu^s$:

(hash-cell)    $\mu^s ::= \mathrm{Ref}(\mathit{hashcode}, \mu, \mathit{auxinfo})$
(constructor)  $\mu ::= \#n \mid \mathrm{Int} \mid \mu_1^s \to \mu_2^s \mid \lambda\kappa^s.\mu^s \mid \mu_1^s[\mu_2^s] \mid \mathrm{Env}(\mu^s, \rho^s) \mid \mathrm{Ind}(\mu_1^s, \mu_2^s)$

Here, a hash cell is a mutable record containing the following
three fields: an integer hash code (hashcode), a term
($\mu$), and a set of auxiliary information (auxinfo). The auxinfo
maintains two attributes: a flag that shows whether $\mu$
is already in normal form⁵ and if so, the set of free type
variables in $\mu$ (in de Bruijn indices,⁶ of course). Building a
new constructor under this representation takes two steps:
(1) calculate the hash code, and (2) if the constructor is not
already in the hash table, calculate the auxinfo and insert
the new cell.

The most interesting aspect of our representation scheme
is that we can also memoize the result of every sequence of
type reductions (e.g., those in Figure 4). Given a constructor
$\mu^s = \mathrm{Ref}(\mathit{hashcode}, \mu, \mathit{auxinfo})$, suppose $\mu$ can be reduced
to $\mu_1$; then, we can do an in-place update, changing the
second field of $\mu^s$ to a memoization node $\mathrm{Ind}(\mu, \mu_1)$.

(Diagram omitted: the hash cell before and after reduction.)

We keep the original $\mu$ in the new memoization node so
that all future creations of $\mu$ (which will always have the
same hash code) will be directly hash-consed to this new
memoization node. The hashing procedure might require
checking syntactic equality against $\mu$ because of potential
hashing conflicts.

Note that the update is always safe, because it is only
done to constructors that are not in normal form, so we do
not have to recalculate the free variables, etc.

Memoization of reduction results has very interesting
consequences: if we do not garbage-collect any of these memoization
nodes (we may since they are weak pointers), then
any redex of form $\mu$ can reuse the memoized result, $\mu_1^s$. This
leads to a very practical implementation that approximates
optimal lambda reductions [18], with the caveat of using
hash-consing, of course.

The combination of these techniques has proven to be
very effective. With hash-consing and memoization, common
operations such as equality tests, testing if a type is in
normal form, and finding out the set of free variables, can
all be done in constant time. With the use of suspension
terms, type application is always done on a by-need basis,
and once it is done, the result will be memoized for future
use. Our measurements have shown that these techniques
reduce the compile time of large applications by an average
of 45% (see Section 7.2).

⁴ Hash-consed substitutions ($\rho^s$) and kinds ($\kappa^s$) are represented
in the same way. Actually, because substitutions are simply finite
mappings from de Bruijn indices to constructors, they share the same
hash table with type constructors (we could simply encode them as a
record constructor).

⁵ By normal form, we mean those constructors that do not contain
any redexes, i.e., no sub-term matches the left-hand side of the
reduction rules in Figure 4.

⁶ If we use named variables to represent the type abstraction in
the term language (see Section 6), we would need to maintain two
separate lists of free variables, one using de Bruijn indices, another
using named variables.

6 Manipulating Types


Although we use hash-consing, memoization, and the sus- 
pension-based lambda encoding to support efficient type 
handling, none of these implementation details are exposed 
to the clients of the type interface (see Figure 3). In fact, ma- 
nipulating types under our type interface is still much like 
manipulating simple datatype-based representations. The 
only thing we have lost is the pattern matching capability. 

Our interface also treats each type as its intension, that 
is, clients never need to think whether or not a type should 
be represented in normal form (or weak-head normal form). 
All operations in the type interface can apply to types of 
any form. Type reductions are completely hidden inside the 
underlying implementation and they are always done lazily. 

Because of the various memoizations we do, our type in- 
terface also provides unusually fast implementations of sev- 
eral common operations. For example, we can check if a 
type is in normal form in constant time; we can also find 
the set of free variables in a type in constant time as well. 

The only remaining issue is how to represent type
variables bound by polymorphic functions in the term language
(i.e., $\Lambda t :: \kappa.e$). For a long time (including our most
recent release), we have used the same de Bruijn indices to 
represent these type variables. This strategy requires no 
changes to the existing interface, but it has the unfortunate 
effect that the representation of a type annotation with free 
variables is now dependent on its lexical depth (the number 
of type abstractions under which it appears). The implica- 
tion is that the client must adjust the representation when 
moving types from one depth to another. 

Although we provide several utility functions to support 
this operation, having de Bruijn indices exposed does com- 
plicate certain optimization phases. Inlining, for example, 
requires adjusting types if the definition and call site are at 
different lexical depths. Specialization requires particularly 
drastic (yet subtle) adjustments to the types, since type ab- 
stractions themselves are being inlined and even eliminated. 
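
The adjustment needed when a type annotation moves to a different lexical depth is the usual de Bruijn shift. The Python sketch below is only an illustration under simplifying assumptions: types are plain tuples, every binder (`"all"`) binds one 0-based index, and the function name `shift` is hypothetical rather than one of the compiler's utility functions.

```python
def shift(ty, delta, cutoff=0):
    """Add delta to every free de Bruijn index (index >= cutoff) in ty."""
    tag = ty[0]
    if tag == "var":
        index = ty[1]
        return ("var", index + delta) if index >= cutoff else ty
    if tag == "all":
        # One more binder: indices below cutoff + 1 are bound inside it.
        return ("all", shift(ty[1], delta, cutoff + 1))
    if tag == "arrow":
        return ("arrow",
                shift(ty[1], delta, cutoff),
                shift(ty[2], delta, cutoff))
    return ty  # primitive types such as ("int",) contain no variables


# An annotation written directly under one type abstraction ...
annotation = ("arrow", ("var", 0), ("int",))
# ... must be rewritten if the code is inlined one abstraction deeper.
moved = shift(annotation, 1)
```

With a memoized free-variable set, a closed type can skip this traversal entirely, since it contains no index that needs adjusting.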

We have experimented with an alternate design which
hides the de Bruijn indices by supporting two different
representations. Inside the type language, the type function
(λ) and the polymorphic quantifier (∀) still bind de Bruijn-indexed
type variables. In the term language, however, type
abstraction (Λ) binds named variables. This way, type
annotations can be moved freely across depths because all free
type variables are guaranteed to be named.

Naturally, this simplicity has a price. First, in order to
reconstruct the type of a Λ term, we must traverse the types,
converting the named variables into de Bruijn indices before
placing the quantifier in front. Second, to memoize the set
of free variables in each type representation, we now need
to maintain two separate lists of type variables, one using
de Bruijn indices, another using named variables. Third,
α-equivalent named types will not share the same
representation. Our intuition, however, is that the additional cost
for Λ-bound type variables will be acceptable, because these
represent a very small portion of the total type size. In
Section 7.3, we give preliminary measurements indicating that
the additional costs are indeed acceptable. We conclude that
this simpler design (using named variables) is quite feasible
and will most likely be used in future versions of the
compiler.

Benchmark  Source  Program Description                     Code Size    Tree Size  Dag Size
            Lines                                            (bytes)      (nodes)   (nodes)
simple        918  A spherical fluid-dynamics program        114,944       34,118     1,913
vliw        3,682  A VLIW instruction scheduler              273,836      646,215     5,682
sml-nj     89,432  SML/NJ compiler v109.32                 6,779,308   20,749,395   125,044
CM          7,703  SML/NJ Compilation Manager by Blume       487,048    3,186,279    27,968
cml         5,966  Concurrent ML by Reppy                    366,684    1,203,391    17,106
eXene      35,662  An X-window system by Reppy & Gansner   2,291,628   99,567,031    78,671
ml-lex      1,232  A lexical-analyzer generator              103,604      112,091     3,122
toy             7  Identity function applied 18 times         22,148   30,409,149        14
toyp            8  Similar, with curried application          30,016  183,463,919       743

Figure 6: Description of benchmarks used. The tree size expresses the number of nodes in the type forest, if types were
represented as trees (with no sharing of any kind). The dag size is the number of nodes actually created to represent the types
in the compiler. The comparison between tree size and dag size is only intended to demonstrate the amount of redundancy
in the types.
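
The first cost of the mixed design, converting named variables into de Bruijn indices when the type of a Λ term is reconstructed, can be sketched as follows. This is an illustrative Python sketch, not the compiler's code: the tuple encoding and the names `abstract` and `type_of_tyabs` are hypothetical, and it relies on the assumption stated in this section that every free type variable in a term-level annotation is named.

```python
def abstract(name, ty, depth=0):
    """Replace the named variable `name` in ty by a de Bruijn index."""
    tag = ty[0]
    if tag == "named":
        return ("var", depth) if ty[1] == name else ty
    if tag == "all":
        # Passing under a quantifier puts one more binder in between.
        return ("all", abstract(name, ty[1], depth + 1))
    if tag == "arrow":
        return ("arrow",
                abstract(name, ty[1], depth),
                abstract(name, ty[2], depth))
    # ("var", i) nodes are bound inside ty and primitives have no
    # variables, so both are returned unchanged (assumption: ty has
    # no free de Bruijn indices).
    return ty


def type_of_tyabs(name, body_ty):
    """Type of a type abstraction over `name` whose body has type body_ty."""
    return ("all", abstract(name, body_ty))


id_a = type_of_tyabs("a", ("arrow", ("named", "a"), ("named", "a")))
id_b = type_of_tyabs("b", ("arrow", ("named", "b"), ("named", "b")))
```

After the conversion the two α-equivalent types `id_a` and `id_b` are structurally identical, so they can again share one hash-consed representation; before the conversion they could not.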

7 Experimental Results 

This section gives empirical evidence demonstrating the ef- 
fectiveness of the techniques presented in Sections 5 and 6. 
All techniques have been implemented in the FLINT/ML 
compiler [35] and in the SML/NJ production compiler since 
version 109.24 (January 9, 1997). All tests were performed 
on a Pentium Pro 200 Linux workstation with 64M physical 
RAM. 

Figure 6 shows the set of benchmarks we used along with 
a summary of their salient features, including the size of the 
types. The dag size is the number of nodes when maximal 
sharing is realized, meaning that even α-equivalent types
share the same representation. The ratio of tree size to dag 
size is intended to demonstrate the amount of redundancy 
in the types; it is not meant as a comparison of our repre- 
sentation to the completely naive one. 
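
The difference between the two measures can be illustrated with a small Python sketch. It is not the instrumentation used for Figure 6: the `Node` class and both function names are hypothetical, and sharing is created by hand (by reusing the same object) rather than by hash-consing.

```python
class Node:
    """A type node; children may be shared between several parents."""
    def __init__(self, tag, *kids):
        self.tag = tag
        self.kids = kids


def tree_size(node, memo=None):
    """Nodes in the fully unshared tree, computed without building it."""
    memo = {} if memo is None else memo
    if id(node) not in memo:
        memo[id(node)] = 1 + sum(tree_size(k, memo) for k in node.kids)
    return memo[id(node)]


def dag_size(node):
    """Distinct nodes reachable from node, i.e. nodes actually allocated."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack.extend(n.kids)
    return len(seen)


# Ten levels of pairing: each level doubles the tree but adds one dag node.
ty = Node("int")
for _ in range(10):
    ty = Node("pair", ty, ty)
```

For this artificial type the tree has 2^11 - 1 = 2,047 nodes while the dag has 11, the same kind of gap (on a much smaller scale) that the toy and toyp rows of Figure 6 exhibit.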

7.1 Hash-consing results 

This redundancy is examined for several benchmarks in
Figure 7. Here, the y-axis represents some proportion of the
type forest, while the x-axis shows the minimum reduction
factor realized on that proportion of the forest thanks to
hash-consing with de Bruijn indices.

The results for VLIW are particularly interesting. VLIW 
is written in an algorithmic style, making little use of higher- 
order functions, functors, or polymorphism. Nevertheless, 
we still get considerable reduction of the types used in the in- 
termediate representation. This shows that hash-consing is 
not only beneficial for heavily functorized applications such 
as eXene and CM. 

[Figure 7 plot omitted; x-axis: ratio of tree size to dag size.]

Figure 7: Amount of redundancy in types. The x-axis represents
the minimum reduction factor realizable on some proportion
of the type forest. For instance, with ml-lex, 80% of
the type forest can be cut at least in half, and 25% can be
reduced by a factor of 8.

In order to get an idea of the cost of hash-consing, we
measured the performance of our hash table. The table is
an array containing 2,048 lists; collisions are handled by
prepending new entries onto the list. Subscripting the array
is very fast, so we need only be concerned with the cost of
traversing the lists. Figure 8 shows the dynamic distribution
of the lengths of list traversals. Most queries are satisfied
after looking at only one or two list entries. One of the
reasons is locality; we place new entries at the head of the
list, so subsequent accesses are immediate. Furthermore,
the table never gets very big; the maximum length of a list
is 12. It seems clear that we should not be concerned about
the performance of the hash table.
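
The table organization described above, together with the kind of counting behind Figure 8, can be sketched in a few lines of Python. This is an illustration only: the class name `BucketTable`, its method `intern`, and the use of Python's built-in `hash` are assumptions, and integer keys stand in for type nodes.

```python
from collections import Counter

NBUCKETS = 2048  # array size quoted in the text


class BucketTable:
    """Fixed-size array of buckets; new entries go to the bucket head."""

    def __init__(self):
        self.buckets = [[] for _ in range(NBUCKETS)]
        self.traversals = Counter()  # entries examined -> number of lookups

    def intern(self, key):
        """Return the stored entry equal to key, inserting it if absent."""
        bucket = self.buckets[hash(key) % NBUCKETS]
        for steps, entry in enumerate(bucket, start=1):
            if entry == key:
                self.traversals[steps] += 1
                return entry
        # Miss: the whole bucket was examined before inserting.
        self.traversals[len(bucket)] += 1
        bucket.insert(0, key)  # prepend, so recent entries are found first
        return key

    def longest_bucket(self):
        return max(len(b) for b in self.buckets)
```

A key that was just inserted is found after examining a single entry, which is the locality effect mentioned above; `traversals` accumulates the histogram of traversal lengths.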

7.2 Memoization results 

Figure 9 summarizes the results of doing various combina- 
tions of memoizations. The y-axis represents the compila- 
tion time of each benchmark, relative to the CPU time with- 
out any memoizations (the absolute time is printed above 
each set of bars). The memoizations performed are a normal- 
form indicator (NF), the set of free variables (FV), interme- 
diate reduction results (RD), and combinations of these. 
[Figure 9 plot omitted: one group of bars per benchmark, with bars for
NF, FV, RD, NF+FV, NF+RD, FV+RD, and All. Absolute times printed above
each set of bars:]

simple   vliw    sml-nj   CM      cml     eXene    toy     toyp
9.35     34.88   924.44   58.96   52.36   723.43   80.65   174.5

Figure 9: Memoization results. This shows the compilation times for each benchmark using various combinations of
memoizations, relative to the time without memoizations. The striped part of each bar represents the type-preserving phases of the
compilation. The results for ml-lex are very similar to those for vliw; they were omitted due to space constraints.

[Figure 8 plot omitted; x-axis: length of list traversal.]

Figure 8: Hash table performance. This shows the dynamic
number of hash table lookups (y-axis) that must search a
bucket of particular length (x-axis). Most queries are
satisfied after looking at only one or two list entries.


The striped part of each bar represents the type-pre- 
serving phases of the compilation, where our memoizations 
should have the most effect. Variation in the rest of the 
compilation time (represented by the solid bars) can be at- 
tributed to measurement error and secondary effects. Notice 
that, without memoizations, the type-preserving phases rep- 
resent a significant portion of the compilation time, even on 
Simple and VLIW (18 and 27%, respectively). 

The eXene benchmark is a large, heavily-functorized
application on which our techniques are particularly effective.
They reduce the total compilation time of eXene by 72% (a
reduction of 93% in the type-preserving phases). Without
memoizations, the type-preserving phases represent a
dominant 78% of the compilation time; with them, these phases
are a manageable 25%. Taking the average over the large
benchmarks (sml-nj, CM, cml, and eXene), our techniques
reduce the total compilation time by 45%.

In most benchmarks, memoizing NF+FV+RD does not 
seem to win much over just NF+FV. Also, in most cases, 
FV+RD achieves results somewhat similar to just FV. One 
might be tempted to assume that the other memoizations 
effectively subsume memoizing reduction results. However, 
there are extreme cases (toy and toyp, for example) where 
RD does improve compilation time when combined with 
other memoizations. These programs contain huge poly- 
morphic types that are later specialized because they are 
only applied to integers. Without memoization of reduction 
results, specialization blows up. 
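
The blow-up can be reproduced in miniature. The Python sketch below is only an analogy under loud assumptions: instead of the reduction rules of Figure 4 it uses a single stand-in rule, fst(pair(a, b)) → a, and the names `mk` and `normalize` are hypothetical. What it shows is how memoizing reduction results on hash-consed nodes changes the number of nodes visited.

```python
_table = {}


def mk(tag, *kids):
    """Hash-consing constructor: one node per distinct (tag, children)."""
    key = (tag,) + tuple(id(k) for k in kids)
    node = _table.get(key)
    if node is None:
        node = _table[key] = (tag, kids)
    return node


def normalize(node, memo, stats):
    """Reduce every fst(pair(a, b)) to a; memo=None disables memoization."""
    if memo is not None and id(node) in memo:
        return memo[id(node)]
    stats["visits"] += 1
    tag, kids = node
    kids = tuple(normalize(k, memo, stats) for k in kids)
    if tag == "fst" and kids[0][0] == "pair":
        result = kids[0][1][0]      # first component, already normalized
    else:
        result = mk(tag, *kids)
    if memo is not None:
        memo[id(node)] = result
    return result


# A 25-node dag whose unshared tree has thousands of nodes.
int_ty = mk("int")
big = int_ty
for _ in range(12):
    big = mk("fst", mk("pair", big, big))

plain, memoized = {"visits": 0}, {"visits": 0}
normalize(big, None, plain)
normalize(big, {}, memoized)
```

Both calls return the node for `int`, but the unmemoized run visits every occurrence in the unshared tree while the memoized run visits each distinct node once.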

7.3 Named variable results 

Finally, we give preliminary measurements of the cost of 
using named type variables in the term language. As dis- 
cussed in Section 6, the phases of the current compiler that 
are most inconvenienced by de Bruijn indices are inlining 
and specialization. 

We added support in our type interface for named vari- 
ables, and changed the FLINT representation to use them 
behind type abstractions. Next, we modified all compiler 
phases through specialization to use the named variables. 
The modifications were fairly painless; deleting the most
subtle parts of the specialization code was downright
enjoyable. We have not yet modified the later phases. Instead, we
temporarily inserted a phase after specialization to convert
all remaining named type variables into de Bruijn indices.
The cost incurred by this extra phase is included in the
measurements given in Figure 10.

Benchmark   Compilation Time (seconds)   Ratio
              deBruijn      namedvar
simple            8.53          8.44      0.99
vliw             28.78         28.37      0.99
sml-nj          566.96        565.91      1.00
CM               57.83         63.63      1.10
cml             105.17        108.54      1.03
eXene           188.46        188.55      1.00
ml-lex            8.19          8.33      1.02
toy               0.10          0.11      1.10
toyp              0.21          0.21      1.00

Figure 10: Named variable results. This shows compilation
times for each benchmark using de Bruijn indices
throughout, and then using named variables in the term language
(and converting to de Bruijn indices after specialization).
The last column gives the ratio of the second version over
the first.

The compilation times of most benchmarks are not no- 
ticeably affected by the change. CM, the primary excep- 
tion, suffered a 10% increase in compilation time due to the 
use of named variables. These results are preliminary be- 
cause we have yet to modify the later phases of the compiler 
to use the new mixed representation (which would obviate 
the need for the extra conversion phase). We suspect that 
the remaining modifications will have no serious impact on 
performance, and with additional profiling and tuning, we 
may even be able to reduce the current overhead. We con- 
clude that the simplified interface (made possible by using 
de Bruijn-indexed type variables internally and named vari- 
ables externally) is quite feasible and will most likely be used 
in future versions of the compiler. 

8 Comparison 

Returning to our implementation criteria, we can certainly 
say that our scheme is very effective at representing types in 
a concise manner and provides us with a fast type equality 
test. Type manipulations are also made efficient by system- 
atic use of memoization. Our experience with the interface 
is very positive since all the machinery is well hidden within 
a few core modules which export simple and intuitive type 
operations. There are nonetheless a few weaknesses: 

• The interface hides the actual implementation behind 
functions which prevent the use of the pattern match- 
ing facilities of ML. This could be circumvented if re- 
ally necessary, but it turned out to be a non-issue. 
Furthermore, the functional interface gives us a lot of 
flexibility. 

• The de Bruijn indices make some manipulations more
subtle than we would like. By using a mix of named
variables and de Bruijn indices, we are able to simplify
such manipulations outside the core modules and still
achieve acceptable performance.

• In order to make sure type traversals are efficient, we 
have to use a fold function on types which encapsulates 
the memoization. Here also, our experience has shown 
that it is not a serious issue. 

Our choice of techniques to provide efficient type ma- 
nipulation should be contrasted with the lettype scheme 
used in the TIL compiler [40]. It should be noted here that 
very little has been published about the lettype scheme, so 
this comparison is based on our own understanding of what 
lettype could look like under the ideal scenario rather than 
any existing implementation such as the one in the TIL com- 
piler. 

The basic approach is to extend the notion of A-normal 
form to types by providing lettype (in both the term and 
the type languages). For example, the identity function on 
integer pairs would look like: 

    lettype t1 = Int * Int
    in lettype t2 = t1 -> t1
    in (λx : t1. x) : t2

This has the advantage of making the sharing explicit since 
all types are referenced through names bound in the type 
environment. The explicit sharing basically eliminates the 
risk of accidentally traversing the type tree in an inefficient 
way. In other words, type traversals get memoized for free. 
But lettype suffers from many problems: 

• There is no known way to define a compact normal
form for such type representations. This implies that
type equality tests become much more expensive. All
existing theoretical frameworks treat lettype t = μ1
in μ2 as if it were a β-reduction of the form (λt :: κ. μ2)[μ1].
This would clearly expand into a normal form, but on
the other hand, this reduction is precisely one that
is banned by the lettype scheme, as otherwise, type
expressions would degenerate into inefficient tree
representations.

• Similarly it is unclear how one could provide a clean 
interface that allows its clients to be oblivious to nor- 
malization issues while still ensuring efficient execu- 
tion, since memoizing the normalization steps would 
require adding types to the environment which in turn 
would force the rewrite of the whole term. 

• Expressing sharing is not enough: we still first need 
to find that sharing. We might be able to get some 
sharing information straight from the type-inference 
phase, but this will require careful coding. Also we 
might not get as much sharing as we would want. 
TIL’s solution is to go through a common sub-expres- 
sion elimination phase. This would indeed allow us 
to merge all the common types, but requires precisely 
the same machinery as hash-consing and is done after 
the fact, whereas we are careful to eliminate common 
sub-expressions as soon as they appear. Furthermore,
many more common sub-expressions will appear during
the compilation process, which will require additional
passes through the CSE phase, while our scheme
takes advantage of the hash-consing all along the
compilation process to guarantee that sharing is constantly
maintained.
• Another subtle difference is that lettype traverses its
types most naturally in a bottom-up fashion which 
precludes (or rather reduces the effectiveness of) op- 
timizations that cut-off the traversal of types. More 
specifically, lettype would not let us make as good a 
use of information such as free-variables or a normal- 
form bit. 
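
The observation that finding the sharing requires the same machinery as hash-consing can be illustrated with a small Python sketch. It reflects our reading of an idealized lettype scheme, not the TIL implementation: the function name `to_lettype`, the tuple encoding of types, and the generated names t1, t2, ... are all hypothetical.

```python
def to_lettype(ty):
    """Name every distinct compound type once, as lettype bindings.

    ty is either a primitive name such as "Int" or a tuple
    (operator, left, right) with operator "*" or "->".
    Returns the name of ty together with the ordered list of bindings.
    """
    names = {}      # right-hand side -> bound name (the sharing table)
    bindings = []   # (name, right-hand side) pairs in definition order

    def go(t):
        if isinstance(t, str):
            return t                      # primitives are used directly
        op, left, right = t
        rhs = f"{go(left)} {op} {go(right)}"
        if rhs not in names:              # first occurrence: bind it
            names[rhs] = f"t{len(names) + 1}"
            bindings.append((names[rhs], rhs))
        return names[rhs]

    return go(ty), bindings


pair = ("*", "Int", "Int")
name, bindings = to_lettype(("->", pair, pair))
```

For the type of the identity function on integer pairs this yields the two bindings of the example above, t1 = Int * Int and t2 = t1 -> t1. The dictionary `names` plays exactly the role of a hash-cons table, which is the point made in the third item of the list.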

To summarize, lettype seems to provide a clean way to 
represent types efficiently, but it ends up having to pay the 
cost of hash-consing anyway without reaping all the benefits 
of our more straightforward scheme. Also it is yet to be seen 
how lettype scales to real world situations such as eXene, 
which our scheme handles easily. 

Our approach manages to hide most if not all of the
complexity of type manipulation, providing programmers with a
simple and intuitive interface. It ensures that maintenance
of type information is non-intrusive, which is greatly
appreciated for optimization phases that do not rely on type
information. The lettype scheme, on the other hand, would most
likely force every phase to maintain at the very least a type
environment.

Finally, our implementation is straightforward since it 
relies on well understood techniques and it does not suffer 
from hidden costs since all the hash-consing and memoizing 
is done once and for all. 

9 Conclusions 

Implementing typed intermediate languages is not a trivial 
task. In this paper, we have presented a series of novel 
techniques that make type-preserving compilers practical 
and scalable. We argue that a type-preserving compiler 
will not scale to handle large types unless all of its type- 
preserving stages preserve the asymptotic time and space 
usage in representing and manipulating types. We believe 
what we learned from our implementation will be valuable 
to future implementations of other emerging typed interme- 
diate languages. 

Availability 

The implementation discussed in this paper is now released 
with the Standard ML of New Jersey (SML/NJ) compiler 
and the FLINT/ML compiler [35]. SML/NJ is a joint work 
by Lucent, Princeton, Yale and AT&T. FLINT is a modern 
compiler infrastructure developed at Yale University. Both
FLINT and SML/NJ are available from the following web 
site: 

http://flint.cs.yale.edu